Overview

Dataset statistics

Number of variables: 20
Number of observations: 455495
Missing cells: 143894
Missing cells (%): 1.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 69.5 MiB
Average record size in memory: 160.0 B
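The derived figures in this table follow from the raw counts. As a quick consistency check (a sketch that uses only numbers reported in this document; the variable names are illustrative), missing cells are divided by the total number of cells and the in-memory size by the number of rows. The three per-variable missing counts are the ones reported in the Bed Grade, City_Code_Patient and Stay sections.

```python
# Figures copied from the report; the names are illustrative only.
n_variables = 20
n_observations = 455_495
missing_cells = 143_894
total_size_mib = 69.5

# Missing counts reported per variable elsewhere in the report.
missing_by_variable = {"Bed Grade": 148, "City_Code_Patient": 6_689, "Stay": 137_057}

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
print("Per-variable counts add up:", sum(missing_by_variable.values()) == missing_cells)
```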

Variable types

Numeric: 9
Categorical: 11

Warnings

Type is highly correlated with Stay (High correlation)
Stay is highly correlated with Type (High correlation)
City_Code_Patient has 6689 (1.5%) missing values (Missing)
Stay has 137057 (30.1%) missing values (Missing)
case_id is uniformly distributed (Uniform)
case_id has unique values (Unique)

Reproduction

Analysis started: 2021-04-05 16:15:24.580725
Analysis finished: 2021-04-05 16:16:39.011706
Duration: 1 minute and 14.43 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml
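The reported duration is simply the difference between the two timestamps. A minimal check with the standard library (the format string is an assumption about how the timestamps are written, based on the two values shown):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-04-05 16:15:24.580725", FMT)
finished = datetime.strptime("2021-04-05 16:16:39.011706", FMT)

# Elapsed wall-clock time in seconds, then split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```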

Variables

df_index
Real number (ℝ≥0)

Distinct: 318438
Distinct (%): 69.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 131930.0164
Minimum: 0
Maximum: 318437
Zeros: 2
Zeros (%): < 0.1%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 11387
Q1: 56936.5
median: 113873
Q3: 204563.5
95-th percentile: 295662.3
Maximum: 318437
Range: 318437
Interquartile range (IQR): 147627

Descriptive statistics

Standard deviation: 90048.67961
Coefficient of variation (CV): 0.6825488399
Kurtosis: -0.9591527657
Mean: 131930.0164
Median Absolute Deviation (MAD): 68188
Skewness: 0.454073103
Sum: 6.00934628 × 10^10
Variance: 8108764699
Monotonicity: Not monotonic
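These figures are consistent with df_index being two 0-based row indices stacked on top of each other, one of length 318438 and one of length 137057 (the Train and Test counts reported under Type). That reading is an inference from the numbers, not something the report states. The sketch below rebuilds such a column and compares its size, distinct count, zeros, sum, mean and median with the values above.

```python
# Assumption: df_index is range(318438) followed by range(137057).
n_train, n_test = 318_438, 137_057

values = sorted(list(range(n_train)) + list(range(n_test)))
n = len(values)

total = sum(values)
mean = total / n
median = values[n // 2]        # n is odd, so the middle element is the median
distinct = len(set(values))
zeros = values.count(0)

print(n, distinct, zeros)
print(total, round(mean, 4), median)
```

Under this assumption every value below 137057 occurs twice, which would also explain why the most frequent values all have a count of 2.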
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 2 | < 0.1%
105659 | 2 | < 0.1%
119986 | 2 | < 0.1%
122035 | 2 | < 0.1%
124084 | 2 | < 0.1%
126133 | 2 | < 0.1%
128182 | 2 | < 0.1%
130231 | 2 | < 0.1%
99512 | 2 | < 0.1%
101561 | 2 | < 0.1%
Other values (318428) | 455475 | > 99.9%

Value | Count | Frequency (%)
0 | 2 | < 0.1%
1 | 2 | < 0.1%
2 | 2 | < 0.1%
3 | 2 | < 0.1%
4 | 2 | < 0.1%

Value | Count | Frequency (%)
318437 | 1 | < 0.1%
318436 | 1 | < 0.1%
318435 | 1 | < 0.1%
318434 | 1 | < 0.1%
318433 | 1 | < 0.1%

case_id
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 455495
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 227748
Minimum: 1
Maximum: 455495
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 22775.7
Q1: 113874.5
median: 227748
Q3: 341621.5
95-th percentile: 432720.3
Maximum: 455495
Range: 455494
Interquartile range (IQR): 227747
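case_id is flagged as unique and strictly increasing, with minimum 1 and maximum 455495, so the quantiles above match the integers 1 to 455495 under linear interpolation between neighbouring order statistics. A minimal sketch, assuming the column is exactly that integer sequence (the report does not list every value) and using a small hand-written `quantile` helper:

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted sequence."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

case_id = range(1, 455_495 + 1)   # assumed contents of the column

q05, q1, median, q3, q95 = (quantile(case_id, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95))
iqr = q3 - q1
value_range = case_id[-1] - case_id[0]

print(round(q05, 1), q1, median, q3, round(q95, 1))
print(value_range, iqr)
```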

Descriptive statistics

Standard deviation: 131490.2248
Coefficient of variation (CV): 0.5773496354
Kurtosis: -1.2
Mean: 227748
Median Absolute Deviation (MAD): 113874
Skewness: -1.170820996 × 10^-15
Sum: 1.037380753 × 10^11
Variance: 1.728967921 × 10^10
Monotonicity: Strictly increasing
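Assuming case_id is exactly the integer sequence 1 to 455495 (suggested by the unique and strictly increasing flags together with the minimum and maximum, but not stated outright), the sum, mean, variance, standard deviation and coefficient of variation can be recomputed directly. The sketch uses the n - 1 (sample) denominator, which is the choice that reproduces the reported variance.

```python
import math

n = 455_495
case_id = range(1, n + 1)          # assumed contents of the column

total = sum(case_id)
mean = total / n
# Sample variance (n - 1 denominator).
variance = sum((x - mean) ** 2 for x in case_id) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                    # coefficient of variation

print(total, mean)
print(round(variance), round(std, 4), round(cv, 10))
```

A perfectly symmetric sequence like this has zero skewness, so a reported skewness on the order of 10^-15 is floating-point rounding rather than a real asymmetry.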
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2049 | 1 | < 0.1%
238369 | 1 | < 0.1%
248620 | 1 | < 0.1%
258859 | 1 | < 0.1%
260906 | 1 | < 0.1%
254761 | 1 | < 0.1%
256808 | 1 | < 0.1%
234279 | 1 | < 0.1%
236326 | 1 | < 0.1%
230181 | 1 | < 0.1%
Other values (455485) | 455485 | > 99.9%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Value | Count | Frequency (%)
455495 | 1 | < 0.1%
455494 | 1 | < 0.1%
455493 | 1 | < 0.1%
455492 | 1 | < 0.1%
455491 | 1 | < 0.1%

Hospital_code
Real number (ℝ≥0)

Distinct: 32
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.32633509
Minimum: 1
Maximum: 32
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 11
median: 19
Q3: 26
95-th percentile: 30
Maximum: 32
Range: 31
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 8.63403567
Coefficient of variation (CV): 0.4711272401
Kurtosis: -1.138349673
Mean: 18.32633509
Median Absolute Deviation (MAD): 7
Skewness: -0.2820898336
Sum: 8347554
Variance: 74.54657196
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=32)
Value | Count | Frequency (%)
26 | 47523 | 10.4%
23 | 38220 | 8.4%
19 | 30036 | 6.6%
6 | 29221 | 6.4%
11 | 24827 | 5.5%
14 | 24715 | 5.4%
28 | 24572 | 5.4%
27 | 20243 | 4.4%
9 | 16360 | 3.6%
12 | 16170 | 3.5%
Other values (22) | 183608 | 40.3%

Value | Count | Frequency (%)
1 | 7460 | 1.6%
2 | 7277 | 1.6%
3 | 10277 | 2.3%
4 | 1749 | 0.4%
5 | 7448 | 1.6%

Value | Count | Frequency (%)
32 | 15252 | 3.3%
31 | 5740 | 1.3%
30 | 7215 | 1.6%
29 | 16158 | 3.5%
28 | 24572 | 5.4%
Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.5 MiB
a
204730 
b
98884 
c
66147 
e
35428 
d
29048 
Other values (2)
21258 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters455495
Distinct characters7
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowc
2nd rowc
3rd rowe
4th rowb
5th rowb
ValueCountFrequency (%)
a204730
44.9%
b98884
21.7%
c66147
 
14.5%
e35428
 
7.8%
d29048
 
6.4%
f15252
 
3.3%
g6006
 
1.3%
Histogram of lengths of the category
ValueCountFrequency (%)
a204730
44.9%
b98884
21.7%
c66147
 
14.5%
e35428
 
7.8%
d29048
 
6.4%
f15252
 
3.3%
g6006
 
1.3%

Most occurring characters

ValueCountFrequency (%)
a204730
44.9%
b98884
21.7%
c66147
 
14.5%
e35428
 
7.8%
d29048
 
6.4%
f15252
 
3.3%
g6006
 
1.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter455495
100.0%

Most frequent character per category

ValueCountFrequency (%)
a204730
44.9%
b98884
21.7%
c66147
 
14.5%
e35428
 
7.8%
d29048
 
6.4%
f15252
 
3.3%
g6006
 
1.3%

Most occurring scripts

ValueCountFrequency (%)
Latin455495
100.0%

Most frequent character per script

ValueCountFrequency (%)
a204730
44.9%
b98884
21.7%
c66147
 
14.5%
e35428
 
7.8%
d29048
 
6.4%
f15252
 
3.3%
g6006
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII455495
100.0%

Most frequent character per block

ValueCountFrequency (%)
a204730
44.9%
b98884
21.7%
c66147
 
14.5%
e35428
 
7.8%
d29048
 
6.4%
f15252
 
3.3%
g6006
 
1.3%

City_Code_Hospital
Real number (ℝ≥0)

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.767797671
Minimum: 1
Maximum: 13
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 5
Q3: 7
95-th percentile: 11
Maximum: 13
Range: 12
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.102450222
Coefficient of variation (CV): 0.6507092869
Kurtosis: -0.6106581387
Mean: 4.767797671
Median Absolute Deviation (MAD): 3
Skewness: 0.5434124607
Sum: 2171708
Variance: 9.625197382
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)
Value | Count | Frequency (%)
1 | 79058 | 17.4%
2 | 74312 | 16.3%
6 | 67441 | 14.8%
7 | 50279 | 11.0%
3 | 45544 | 10.0%
5 | 44395 | 9.7%
9 | 37428 | 8.2%
11 | 24572 | 5.4%
4 | 19778 | 4.3%
10 | 7460 | 1.6%

Value | Count | Frequency (%)
1 | 79058 | 17.4%
2 | 74312 | 16.3%
3 | 45544 | 10.0%
4 | 19778 | 4.3%
5 | 44395 | 9.7%

Value | Count | Frequency (%)
13 | 5228 | 1.1%
11 | 24572 | 5.4%
10 | 7460 | 1.6%
9 | 37428 | 8.2%
7 | 50279 | 11.0%
Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.5 MiB
X
190849 
Y
174707 
Z
89939 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters455495
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowZ
2nd rowZ
3rd rowX
4th rowY
5th rowY
ValueCountFrequency (%)
X190849
41.9%
Y174707
38.4%
Z89939
19.7%
Histogram of lengths of the category
ValueCountFrequency (%)
x190849
41.9%
y174707
38.4%
z89939
19.7%

Most occurring characters

ValueCountFrequency (%)
X190849
41.9%
Y174707
38.4%
Z89939
19.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter455495
100.0%

Most frequent character per category

ValueCountFrequency (%)
X190849
41.9%
Y174707
38.4%
Z89939
19.7%

Most occurring scripts

ValueCountFrequency (%)
Latin455495
100.0%

Most frequent character per script

ValueCountFrequency (%)
X190849
41.9%
Y174707
38.4%
Z89939
19.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII455495
100.0%

Most frequent character per block

ValueCountFrequency (%)
X190849
41.9%
Y174707
38.4%
Z89939
19.7%
Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.196140463
Minimum: 0
Maximum: 24
Zeros: 22
Zeros (%): < 0.1%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 2
median: 3
Q3: 4
95-th percentile: 5
Maximum: 24
Range: 24
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.166993742
Coefficient of variation (CV): 0.3651259247
Kurtosis: 2.549561887
Mean: 3.196140463
Median Absolute Deviation (MAD): 1
Skewness: 0.9589704305
Sum: 1455826
Variance: 1.361874393
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)
Value | Count | Frequency (%)
2 | 140895 | 30.9%
4 | 131191 | 28.8%
3 | 130755 | 28.7%
5 | 27602 | 6.1%
6 | 11003 | 2.4%
1 | 7984 | 1.8%
7 | 4107 | 0.9%
8 | 1468 | 0.3%
9 | 327 | 0.1%
10 | 89 | < 0.1%
Other values (8) | 74 | < 0.1%

Value | Count | Frequency (%)
0 | 22 | < 0.1%
1 | 7984 | 1.8%
2 | 140895 | 30.9%
3 | 130755 | 28.7%
4 | 131191 | 28.8%

Value | Count | Frequency (%)
24 | 1 | < 0.1%
21 | 4 | < 0.1%
20 | 2 | < 0.1%
14 | 1 | < 0.1%
13 | 3 | < 0.1%

Department
Categorical

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.5 MiB
gynecology
356688 
anesthesia
42358 
radiotherapy
41033 
TB & Chest disease
 
13751
surgery
 
1665

Length

Max length18
Median length10
Mean length10.41071581
Min length7

Characters and Unicode

Total characters4742029
Distinct characters21
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowradiotherapy
2nd rowradiotherapy
3rd rowanesthesia
4th rowradiotherapy
5th rowradiotherapy
ValueCountFrequency (%)
gynecology356688
78.3%
anesthesia42358
 
9.3%
radiotherapy41033
 
9.0%
TB & Chest disease13751
 
3.0%
surgery1665
 
0.4%
Histogram of lengths of the category
ValueCountFrequency (%)
gynecology356688
71.8%
anesthesia42358
 
8.5%
radiotherapy41033
 
8.3%
13751
 
2.8%
disease13751
 
2.8%
tb13751
 
2.8%
chest13751
 
2.8%
surgery1665
 
0.3%

Most occurring characters

ValueCountFrequency (%)
y756074
15.9%
o754409
15.9%
g715041
15.1%
e525355
11.1%
n399046
8.4%
c356688
7.5%
l356688
7.5%
a180533
 
3.8%
s127634
 
2.7%
i97142
 
2.0%
Other values (11)473419
10.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter4645772
98.0%
Uppercase Letter41253
 
0.9%
Space Separator41253
 
0.9%
Other Punctuation13751
 
0.3%
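The category counts above can be rebuilt from the Department frequencies alone. A minimal sketch using Python's standard `unicodedata` module, weighting each character's general category by the row count of the value it appears in (the counts are copied from the Department frequency table; the long category names are mapped by hand and are not taken from any library):

```python
import unicodedata
from collections import Counter

# Value counts copied from the Department frequency table.
department_counts = {
    "gynecology": 356_688,
    "anesthesia": 42_358,
    "radiotherapy": 41_033,
    "TB & Chest disease": 13_751,
    "surgery": 1_665,
}

# Hand-written mapping from two-letter general categories to the report's labels.
LABELS = {"Ll": "Lowercase Letter", "Lu": "Uppercase Letter",
          "Zs": "Space Separator", "Po": "Other Punctuation"}

category_totals = Counter()
for value, rows in department_counts.items():
    for char in value:
        category_totals[LABELS[unicodedata.category(char)]] += rows

total_characters = sum(category_totals.values())
for label, count in category_totals.most_common():
    print(f"{label}: {count} ({100 * count / total_characters:.1f}%)")
```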

Most frequent character per category

ValueCountFrequency (%)
y756074
16.3%
o754409
16.2%
g715041
15.4%
e525355
11.3%
n399046
8.6%
c356688
7.7%
l356688
7.7%
a180533
 
3.9%
s127634
 
2.7%
i97142
 
2.1%
Other values (6)377162
8.1%
ValueCountFrequency (%)
T13751
33.3%
B13751
33.3%
C13751
33.3%
ValueCountFrequency (%)
41253
100.0%
ValueCountFrequency (%)
&13751
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4687025
98.8%
Common55004
 
1.2%

Most frequent character per script

ValueCountFrequency (%)
y756074
16.1%
o754409
16.1%
g715041
15.3%
e525355
11.2%
n399046
8.5%
c356688
7.6%
l356688
7.6%
a180533
 
3.9%
s127634
 
2.7%
i97142
 
2.1%
Other values (9)418415
8.9%
ValueCountFrequency (%)
41253
75.0%
&13751
 
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4742029
100.0%

Most frequent character per block

ValueCountFrequency (%)
y756074
15.9%
o754409
15.9%
g715041
15.1%
e525355
11.1%
n399046
8.4%
c356688
7.5%
l356688
7.5%
a180533
 
3.8%
s127634
 
2.7%
i97142
 
2.0%
Other values (11)473419
10.0%

Ward_Type
Categorical

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.5 MiB
R
182939 
Q
152046 
S
111166 
P
 
7199
T
 
2133

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters455495
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowR
2nd rowS
3rd rowS
4th rowR
5th rowS
ValueCountFrequency (%)
R182939
40.2%
Q152046
33.4%
S111166
24.4%
P7199
 
1.6%
T2133
 
0.5%
U12
 
< 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
r182939
40.2%
q152046
33.4%
s111166
24.4%
p7199
 
1.6%
t2133
 
0.5%
u12
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
R182939
40.2%
Q152046
33.4%
S111166
24.4%
P7199
 
1.6%
T2133
 
0.5%
U12
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter455495
100.0%

Most frequent character per category

ValueCountFrequency (%)
R182939
40.2%
Q152046
33.4%
S111166
24.4%
P7199
 
1.6%
T2133
 
0.5%
U12
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin455495
100.0%

Most frequent character per script

ValueCountFrequency (%)
R182939
40.2%
Q152046
33.4%
S111166
24.4%
P7199
 
1.6%
T2133
 
0.5%
U12
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII455495
100.0%

Most frequent character per block

ValueCountFrequency (%)
R182939
40.2%
Q152046
33.4%
S111166
24.4%
P7199
 
1.6%
T2133
 
0.5%
U12
 
< 0.1%
Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.5 MiB
F
161470 
E
79058 
D
74312 
C
50279 
B
50116 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters455495
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowF
2nd rowF
3rd rowE
4th rowD
5th rowD
ValueCountFrequency (%)
F161470
35.4%
E79058
17.4%
D74312
16.3%
C50279
 
11.0%
B50116
 
11.0%
A40260
 
8.8%
Histogram of lengths of the category
ValueCountFrequency (%)
f161470
35.4%
e79058
17.4%
d74312
16.3%
c50279
 
11.0%
b50116
 
11.0%
a40260
 
8.8%

Most occurring characters

ValueCountFrequency (%)
F161470
35.4%
E79058
17.4%
D74312
16.3%
C50279
 
11.0%
B50116
 
11.0%
A40260
 
8.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter455495
100.0%

Most frequent character per category

ValueCountFrequency (%)
F161470
35.4%
E79058
17.4%
D74312
16.3%
C50279
 
11.0%
B50116
 
11.0%
A40260
 
8.8%

Most occurring scripts

ValueCountFrequency (%)
Latin455495
100.0%

Most frequent character per script

ValueCountFrequency (%)
F161470
35.4%
E79058
17.4%
D74312
16.3%
C50279
 
11.0%
B50116
 
11.0%
A40260
 
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII455495
100.0%

Most frequent character per block

ValueCountFrequency (%)
F161470
35.4%
E79058
17.4%
D74312
16.3%
C50279
 
11.0%
B50116
 
11.0%
A40260
 
8.8%

Bed Grade
Categorical

Distinct4
Distinct (%)< 0.1%
Missing148
Missing (%)< 0.1%
Memory size3.5 MiB
2.0
176451 
3.0
158942 
4.0
82387 
1.0
37567 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters1366041
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2.0
2nd row2.0
3rd row2.0
4th row2.0
5th row2.0
ValueCountFrequency (%)
2.0176451
38.7%
3.0158942
34.9%
4.082387
18.1%
1.037567
 
8.2%
(Missing)148
 
< 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
2.0176451
38.8%
3.0158942
34.9%
4.082387
18.1%
1.037567
 
8.3%
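The two Bed Grade tables report slightly different percentages for the same counts (38.7% versus 38.8% for 2.0, and 8.2% versus 8.3% for 1.0). The numbers are consistent with the first table dividing by all 455495 rows and the second dividing by the 455347 non-missing rows. A short check of that reading, using only the counts shown here:

```python
counts = {"2.0": 176_451, "3.0": 158_942, "4.0": 82_387, "1.0": 37_567}
n_missing = 148

n_non_missing = sum(counts.values())
n_rows = n_non_missing + n_missing

# Share of all rows (missing rows included in the denominator).
share_of_all = {k: f"{100 * v / n_rows:.1f}%" for k, v in counts.items()}
# Share of rows that actually have a Bed Grade value.
share_of_present = {k: f"{100 * v / n_non_missing:.1f}%" for k, v in counts.items()}

print(share_of_all)
print(share_of_present)
```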

Most occurring characters

ValueCountFrequency (%)
.455347
33.3%
0455347
33.3%
2176451
 
12.9%
3158942
 
11.6%
482387
 
6.0%
137567
 
2.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number910694
66.7%
Other Punctuation455347
33.3%

Most frequent character per category

ValueCountFrequency (%)
0455347
50.0%
2176451
 
19.4%
3158942
 
17.5%
482387
 
9.0%
137567
 
4.1%
ValueCountFrequency (%)
.455347
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1366041
100.0%

Most frequent character per script

ValueCountFrequency (%)
.455347
33.3%
0455347
33.3%
2176451
 
12.9%
3158942
 
11.6%
482387
 
6.0%
137567
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII1366041
100.0%

Most frequent character per block

ValueCountFrequency (%)
.455347
33.3%
0455347
33.3%
2176451
 
12.9%
3158942
 
11.6%
482387
 
6.0%
137567
 
2.8%

patientid
Real number (ℝ≥0)

Distinct: 131624
Distinct (%): 28.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 65786.79356
Minimum: 1
Maximum: 131624
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 6607
Q1: 32874
median: 65735
Q3: 98576.5
95-th percentile: 125071.3
Maximum: 131624
Range: 131623
Interquartile range (IQR): 65702.5

Descriptive statistics

Standard deviation: 37968.83085
Coefficient of variation (CV): 0.5771497408
Kurtosis: -1.197556563
Mean: 65786.79356
Median Absolute Deviation (MAD): 32852
Skewness: 0.003135672615
Sum: 2.996555553 × 10^10
Variance: 1441632116
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
66714 | 50 | < 0.1%
91292 | 43 | < 0.1%
38525 | 39 | < 0.1%
114456 | 37 | < 0.1%
101359 | 36 | < 0.1%
33491 | 34 | < 0.1%
32886 | 32 | < 0.1%
6645 | 31 | < 0.1%
99644 | 30 | < 0.1%
31203 | 30 | < 0.1%
Other values (131614) | 455133 | 99.9%

Value | Count | Frequency (%)
1 | 4 | < 0.1%
2 | 2 | < 0.1%
3 | 4 | < 0.1%
4 | 2 | < 0.1%
5 | 7 | < 0.1%

Value | Count | Frequency (%)
131624 | 3 | < 0.1%
131623 | 2 | < 0.1%
131622 | 4 | < 0.1%
131621 | 3 | < 0.1%
131620 | 9 | < 0.1%

City_Code_Patient
Real number (ℝ≥0)

MISSING

Distinct: 37
Distinct (%): < 0.1%
Missing: 6689
Missing (%): 1.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.249495328
Minimum: 1
Maximum: 38
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
median: 8
Q3: 8
95-th percentile: 16
Maximum: 38
Range: 37
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 4.758940953
Coefficient of variation (CV): 0.6564513442
Kurtosis: 4.516135526
Mean: 7.249495328
Median Absolute Deviation (MAD): 1
Skewness: 1.601060026
Sum: 3253617
Variance: 22.64751899
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=37)
Value | Count | Frequency (%)
8 | 176825 | 38.8%
2 | 55681 | 12.2%
1 | 37772 | 8.3%
7 | 33958 | 7.5%
5 | 28978 | 6.4%
4 | 22044 | 4.8%
9 | 16692 | 3.7%
15 | 12804 | 2.8%
10 | 11809 | 2.6%
6 | 8723 | 1.9%
Other values (27) | 43520 | 9.6%

Value | Count | Frequency (%)
1 | 37772 | 8.3%
2 | 55681 | 12.2%
3 | 5401 | 1.2%
4 | 22044 | 4.8%
5 | 28978 | 6.4%

Value | Count | Frequency (%)
38 | 18 | < 0.1%
37 | 78 | < 0.1%
36 | 29 | < 0.1%
35 | 30 | < 0.1%
34 | 96 | < 0.1%
Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.5 MiB
Trauma
217672 
Emergency
168363 
Urgent
69460 

Length

Max length9
Median length6
Mean length7.108879351
Min length6

Characters and Unicode

Total characters3238059
Distinct characters13
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowEmergency
2nd rowTrauma
3rd rowTrauma
4th rowTrauma
5th rowTrauma
ValueCountFrequency (%)
Trauma217672
47.8%
Emergency168363
37.0%
Urgent69460
 
15.2%
Histogram of lengths of the category
ValueCountFrequency (%)
trauma217672
47.8%
emergency168363
37.0%
urgent69460
 
15.2%

Most occurring characters

ValueCountFrequency (%)
r455495
14.1%
a435344
13.4%
e406186
12.5%
m386035
11.9%
g237823
7.3%
n237823
7.3%
T217672
6.7%
u217672
6.7%
E168363
 
5.2%
c168363
 
5.2%
Other values (3)307283
9.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2782564
85.9%
Uppercase Letter455495
 
14.1%

Most frequent character per category

ValueCountFrequency (%)
r455495
16.4%
a435344
15.6%
e406186
14.6%
m386035
13.9%
g237823
8.5%
n237823
8.5%
u217672
7.8%
c168363
 
6.1%
y168363
 
6.1%
t69460
 
2.5%
ValueCountFrequency (%)
T217672
47.8%
E168363
37.0%
U69460
 
15.2%

Most occurring scripts

ValueCountFrequency (%)
Latin3238059
100.0%

Most frequent character per script

ValueCountFrequency (%)
r455495
14.1%
a435344
13.4%
e406186
12.5%
m386035
11.9%
g237823
7.3%
n237823
7.3%
T217672
6.7%
u217672
6.7%
E168363
 
5.2%
c168363
 
5.2%
Other values (3)307283
9.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII3238059
100.0%

Most frequent character per block

ValueCountFrequency (%)
r455495
14.1%
a435344
13.4%
e406186
12.5%
m386035
11.9%
g237823
7.3%
n237823
7.3%
T217672
6.7%
u217672
6.7%
E168363
 
5.2%
c168363
 
5.2%
Other values (3)307283
9.5%
Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.5 MiB
Moderate
251565 
Minor
122735 
Extreme
81195 

Length

Max length8
Median length8
Mean length7.013381047
Min length5

Characters and Unicode

Total characters3194560
Distinct characters12
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowExtreme
2nd rowExtreme
3rd rowExtreme
4th rowExtreme
5th rowExtreme
ValueCountFrequency (%)
Moderate251565
55.2%
Minor122735
26.9%
Extreme81195
 
17.8%
Histogram of lengths of the category
ValueCountFrequency (%)
moderate251565
55.2%
minor122735
26.9%
extreme81195
 
17.8%

Most occurring characters

ValueCountFrequency (%)
e665520
20.8%
r455495
14.3%
M374300
11.7%
o374300
11.7%
t332760
10.4%
d251565
 
7.9%
a251565
 
7.9%
i122735
 
3.8%
n122735
 
3.8%
E81195
 
2.5%
Other values (2)162390
 
5.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2739065
85.7%
Uppercase Letter455495
 
14.3%

Most frequent character per category

ValueCountFrequency (%)
e665520
24.3%
r455495
16.6%
o374300
13.7%
t332760
12.1%
d251565
 
9.2%
a251565
 
9.2%
i122735
 
4.5%
n122735
 
4.5%
x81195
 
3.0%
m81195
 
3.0%
ValueCountFrequency (%)
M374300
82.2%
E81195
 
17.8%

Most occurring scripts

ValueCountFrequency (%)
Latin3194560
100.0%

Most frequent character per script

ValueCountFrequency (%)
e665520
20.8%
r455495
14.3%
M374300
11.7%
o374300
11.7%
t332760
10.4%
d251565
 
7.9%
a251565
 
7.9%
i122735
 
3.8%
n122735
 
3.8%
E81195
 
2.5%
Other values (2)162390
 
5.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII3194560
100.0%

Most frequent character per block

ValueCountFrequency (%)
e665520
20.8%
r455495
14.3%
M374300
11.7%
o374300
11.7%
t332760
10.4%
d251565
 
7.9%
a251565
 
7.9%
i122735
 
3.8%
n122735
 
3.8%
E81195
 
2.5%
Other values (2)162390
 
5.1%

Visitors with Patient
Real number (ℝ≥0)

Distinct: 29
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.284229245
Minimum: 0
Maximum: 32
Zeros: 34
Zeros (%): < 0.1%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 2
median: 3
Q3: 4
95-th percentile: 6
Maximum: 32
Range: 32
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.768044196
Coefficient of variation (CV): 0.538343722
Kurtosis: 21.82119341
Mean: 3.284229245
Median Absolute Deviation (MAD): 1
Skewness: 3.225088709
Sum: 1495950
Variance: 3.125980278
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=29)
Value | Count | Frequency (%)
2 | 197734 | 43.4%
4 | 113497 | 24.9%
3 | 84689 | 18.6%
6 | 27011 | 5.9%
5 | 13314 | 2.9%
8 | 6920 | 1.5%
7 | 3556 | 0.8%
9 | 1918 | 0.4%
1 | 1776 | 0.4%
10 | 1632 | 0.4%
Other values (19) | 3448 | 0.8%

Value | Count | Frequency (%)
0 | 34 | < 0.1%
1 | 1776 | 0.4%
2 | 197734 | 43.4%
3 | 84689 | 18.6%
4 | 113497 | 24.9%

Value | Count | Frequency (%)
32 | 12 | < 0.1%
30 | 26 | < 0.1%
29 | 10 | < 0.1%
25 | 12 | < 0.1%
24 | 99 | < 0.1%

Age
Categorical

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.5 MiB
41-50
91495 
31-40
90420 
51-60
69506 
21-30
58560 
71-80
50737 
Other values (5)
94777 

Length

Max length6
Median length5
Mean length4.984120572
Min length4

Characters and Unicode

Total characters2270242
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row51-60
2nd row51-60
3rd row51-60
4th row51-60
5th row51-60
ValueCountFrequency (%)
41-5091495
20.1%
31-4090420
19.9%
51-6069506
15.3%
21-3058560
12.9%
71-8050737
11.1%
61-7048619
10.7%
11-2023871
 
5.2%
81-9011240
 
2.5%
0-109140
 
2.0%
91-1001907
 
0.4%
Histogram of lengths of the category
ValueCountFrequency (%)
41-5091495
20.1%
31-4090420
19.9%
51-6069506
15.3%
21-3058560
12.9%
71-8050737
11.1%
61-7048619
10.7%
11-2023871
 
5.2%
81-9011240
 
2.5%
0-109140
 
2.0%
91-1001907
 
0.4%

Most occurring characters

ValueCountFrequency (%)
1481273
21.2%
0466542
20.6%
-455495
20.1%
4181915
 
8.0%
5161001
 
7.1%
3148980
 
6.6%
6118125
 
5.2%
799356
 
4.4%
282431
 
3.6%
861977
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1814747
79.9%
Dash Punctuation455495
 
20.1%

Most frequent character per category

ValueCountFrequency (%)
1481273
26.5%
0466542
25.7%
4181915
 
10.0%
5161001
 
8.9%
3148980
 
8.2%
6118125
 
6.5%
799356
 
5.5%
282431
 
4.5%
861977
 
3.4%
913147
 
0.7%
ValueCountFrequency (%)
-455495
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2270242
100.0%

Most frequent character per script

ValueCountFrequency (%)
1481273
21.2%
0466542
20.6%
-455495
20.1%
4181915
 
8.0%
5161001
 
7.1%
3148980
 
6.6%
6118125
 
5.2%
799356
 
4.4%
282431
 
3.6%
861977
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII2270242
100.0%

Most frequent character per block

ValueCountFrequency (%)
1481273
21.2%
0466542
20.6%
-455495
20.1%
4181915
 
8.0%
5161001
 
7.1%
3148980
 
6.6%
6118125
 
5.2%
799356
 
4.4%
282431
 
3.6%
861977
 
2.7%

Admission_Deposit
Real number (ℝ≥0)

Distinct: 7634
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4877.434022
Minimum: 1800
Maximum: 11920
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 1800
5-th percentile: 3360
Q1: 4184
median: 4738
Q3: 5405
95-th percentile: 6918
Maximum: 11920
Range: 10120
Interquartile range (IQR): 1221

Descriptive statistics

Standard deviation: 1084.982089
Coefficient of variation (CV): 0.2224493625
Kurtosis: 1.854782006
Mean: 4877.434022
Median Absolute Deviation (MAD): 603
Skewness: 0.9313474371
Sum: 2221646810
Variance: 1177186.134
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
4469 | 445 | 0.1%
4277 | 427 | 0.1%
4624 | 408 | 0.1%
4789 | 378 | 0.1%
4400 | 340 | 0.1%
4807 | 333 | 0.1%
4970 | 332 | 0.1%
4465 | 328 | 0.1%
4603 | 309 | 0.1%
4579 | 306 | 0.1%
Other values (7624) | 451889 | 99.2%

Value | Count | Frequency (%)
1800 | 2 | < 0.1%
1801 | 2 | < 0.1%
1802 | 2 | < 0.1%
1805 | 2 | < 0.1%
1806 | 1 | < 0.1%

Value | Count | Frequency (%)
11920 | 1 | < 0.1%
11293 | 1 | < 0.1%
11008 | 4 | < 0.1%
10999 | 2 | < 0.1%
10842 | 1 | < 0.1%

Stay
Categorical

HIGH CORRELATION
MISSING

Distinct11
Distinct (%)< 0.1%
Missing137057
Missing (%)30.1%
Memory size3.5 MiB
21-30
87491 
11-20
78139 
31-40
55159 
51-60
35018 
0-10
23604 
Other values (6)
39027 

Length

Max length18
Median length5
Mean length5.207387309
Min length4

Characters and Unicode

Total characters1658230
Distinct characters23
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0-10
2nd row41-50
3rd row31-40
4th row41-50
5th row41-50
ValueCountFrequency (%)
21-3087491
19.2%
11-2078139
17.2%
31-4055159
12.1%
51-6035018
 
7.7%
0-1023604
 
5.2%
41-5011743
 
2.6%
71-8010254
 
2.3%
More than 100 Days6683
 
1.5%
81-904838
 
1.1%
91-1002765
 
0.6%
(Missing)137057
30.1%
Histogram of lengths of the category
ValueCountFrequency (%)
21-3087491
25.8%
11-2078139
23.1%
31-4055159
16.3%
51-6035018
10.3%
0-1023604
 
7.0%
41-5011743
 
3.5%
71-8010254
 
3.0%
days6683
 
2.0%
than6683
 
2.0%
1006683
 
2.0%
Other values (4)17030
 
5.0%

Most occurring characters

ValueCountFrequency (%)
1399342
24.1%
0351490
21.2%
-311755
18.8%
2165630
10.0%
3142650
 
8.6%
466902
 
4.0%
546761
 
2.8%
637762
 
2.3%
20049
 
1.2%
815092
 
0.9%
Other values (13)100797
 
6.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1246230
75.2%
Dash Punctuation311755
 
18.8%
Lowercase Letter66830
 
4.0%
Space Separator20049
 
1.2%
Uppercase Letter13366
 
0.8%

Most frequent character per category

ValueCountFrequency (%)
1399342
32.0%
0351490
28.2%
2165630
13.3%
3142650
 
11.4%
466902
 
5.4%
546761
 
3.8%
637762
 
3.0%
815092
 
1.2%
712998
 
1.0%
97603
 
0.6%
ValueCountFrequency (%)
a13366
20.0%
o6683
10.0%
r6683
10.0%
e6683
10.0%
t6683
10.0%
h6683
10.0%
n6683
10.0%
y6683
10.0%
s6683
10.0%
ValueCountFrequency (%)
M6683
50.0%
D6683
50.0%
ValueCountFrequency (%)
-311755
100.0%
ValueCountFrequency (%)
20049
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1578034
95.2%
Latin80196
 
4.8%

Most frequent character per script

ValueCountFrequency (%)
1399342
25.3%
0351490
22.3%
-311755
19.8%
2165630
10.5%
3142650
 
9.0%
466902
 
4.2%
546761
 
3.0%
637762
 
2.4%
20049
 
1.3%
815092
 
1.0%
Other values (2)20601
 
1.3%
ValueCountFrequency (%)
a13366
16.7%
M6683
8.3%
o6683
8.3%
r6683
8.3%
e6683
8.3%
t6683
8.3%
h6683
8.3%
n6683
8.3%
D6683
8.3%
y6683
8.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1658230
100.0%

Most frequent character per block

ValueCountFrequency (%)
1399342
24.1%
0351490
21.2%
-311755
18.8%
2165630
10.0%
3142650
 
8.6%
466902
 
4.0%
546761
 
2.8%
637762
 
2.3%
20049
 
1.2%
815092
 
0.9%
Other values (13)100797
 
6.1%

Type
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 MiB
Train: 318438
Test: 137057

Length

Max length: 5
Median length: 5
Mean length: 4.699103173
Min length: 4

Characters and Unicode

Total characters: 2140418
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
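
The character statistics above can be reproduced from the two value counts alone. The following is a minimal sketch, not the report's own implementation: it uses Python's standard `unicodedata` module to look up the general category of each character, and the names `value_counts` and `category_totals` are illustrative assumptions.

```python
import unicodedata
from collections import Counter

# Value counts for the Type variable, as reported in this section.
value_counts = {"Train": 318438, "Test": 137057}


def category_totals(counts):
    """Sum characters per Unicode general category, weighted by row count."""
    totals = Counter()
    for value, n_rows in counts.items():
        for ch in value:
            # unicodedata.category returns a two-letter code,
            # e.g. 'Lu' (uppercase letter) or 'Ll' (lowercase letter).
            totals[unicodedata.category(ch)] += n_rows
    return totals


totals = category_totals(value_counts)
total_chars = sum(totals.values())
mean_length = total_chars / sum(value_counts.values())

print(dict(totals))
print(total_chars, round(mean_length, 9))
```

The two category codes correspond to the "Uppercase Letter" and "Lowercase Letter" rows of the category table, and the weighted total divided by the number of rows gives the mean length.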

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Train
2nd row: Train
3rd row: Train
4th row: Train
5th row: Train
Value | Count | Frequency (%)
Train | 318438 | 69.9%
Test | 137057 | 30.1%
Histogram of lengths of the category
Value | Count | Frequency (%)
train | 318438 | 69.9%
test | 137057 | 30.1%

Most occurring characters

Value | Count | Frequency (%)
T | 455495 | 21.3%
r | 318438 | 14.9%
a | 318438 | 14.9%
i | 318438 | 14.9%
n | 318438 | 14.9%
e | 137057 | 6.4%
s | 137057 | 6.4%
t | 137057 | 6.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1684923 | 78.7%
Uppercase Letter | 455495 | 21.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 318438 | 18.9%
a | 318438 | 18.9%
i | 318438 | 18.9%
n | 318438 | 18.9%
e | 137057 | 8.1%
s | 137057 | 8.1%
t | 137057 | 8.1%

Uppercase Letter
Value | Count | Frequency (%)
T | 455495 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2140418 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
T | 455495 | 21.3%
r | 318438 | 14.9%
a | 318438 | 14.9%
i | 318438 | 14.9%
n | 318438 | 14.9%
e | 137057 | 6.4%
s | 137057 | 6.4%
t | 137057 | 6.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2140418 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
T | 455495 | 21.3%
r | 318438 | 14.9%
a | 318438 | 14.9%
i | 318438 | 14.9%
n | 318438 | 14.9%
e | 137057 | 6.4%
s | 137057 | 6.4%
t | 137057 | 6.4%

Interactions

[Pairwise interaction plots (72 figures); images not reproduced.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
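
As an illustration of the calculation just described, here is a minimal pure-Python sketch. It is not the code the report runs; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n-1) factors of the covariance and of the two standard
    # deviations cancel, so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: y is an exact increasing linear function of x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
print(pearson_r(xs, ys))

# Shifting and rescaling x (by a positive factor) leaves r unchanged.
print(pearson_r([10 * x + 7 for x in xs], ys))
```

The second call illustrates the invariance under changes in location and scale mentioned above.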

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
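
The rank-based calculation can be sketched as follows. This is an illustrative pure-Python version under the assumption that tied values receive the average of their ranks; the helper names `average_ranks` and `spearman_rho` are hypothetical and not taken from the report's software.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: y = x**3 is monotonic but not linear in x.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys))
```

Because only the ordering matters, the cubic relationship in the example is treated the same as a linear one.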

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
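
The pair-counting definition above corresponds to the variant often called tau-a. The sketch below implements exactly that definition in pure Python; it is an assumption that this matches the report only for data without ties, since library implementations commonly apply a tie correction. The name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: one of the six pairs is out of order.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

This direct version compares every pair, so it takes quadratic time; it is meant to show the definition rather than to scale to a dataset of this size.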

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
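
A bias-corrected Cramér's V in the spirit of Bergsma's proposal can be sketched from a contingency table of counts. This is a minimal illustration under the usual statement of the correction (φ² is reduced by (k-1)(r-1)/(n-1) and the table dimensions are shrunk accordingly), not necessarily the exact code used by the report; `cramers_v_corrected` and the example tables are assumptions.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


# Hypothetical tables: independent rows, then a perfectly associated pair.
print(cramers_v_corrected([[10, 20], [20, 40]]))
print(cramers_v_corrected([[50, 0], [0, 50]]))
```

Clamping the corrected φ² at zero keeps the coefficient at 0 for tables whose observed association is no larger than what chance alone would produce.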

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
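
The nullity correlation shown in the heatmap can be understood as an ordinary Pearson correlation between presence indicators (1 if a value is present, 0 if it is missing). The sketch below illustrates that idea in pure Python on toy rows; the function name `nullity_correlation`, the column names and the data are hypothetical, and the report's plotting library may differ in details such as how fully populated columns are handled.

```python
import math


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    Returns None when either column is always present or always missing,
    because the correlation is undefined without variation.
    """
    a = [0 if row[col_a] is None else 1 for row in rows]
    b = [0 if row[col_b] is None else 1 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    if ss_a == 0 or ss_b == 0:
        return None
    return cov / math.sqrt(ss_a * ss_b)


# Toy rows (not taken from the dataset): "a" and "b" are missing together,
# while "c" is missing exactly when "a" is present.
rows = [
    {"a": 1, "b": 2, "c": None},
    {"a": None, "b": None, "c": 5},
    {"a": 3, "b": 4, "c": None},
    {"a": None, "b": None, "c": 6},
]
print(nullity_correlation(rows, "a", "b"))
print(nullity_correlation(rows, "a", "c"))
```

A value near +1 means the two columns tend to be missing in the same rows, and a value near -1 means one tends to be present exactly when the other is missing.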

Sample

First rows

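(index) | df_index | case_id | Hospital_code | Hospital_type_code | City_Code_Hospital | Hospital_region_code | Available Extra Rooms in Hospital | Department | Ward_Type | Ward_Facility_Code | Bed Grade | patientid | City_Code_Patient | Type of Admission | Severity of Illness | Visitors with Patient | Age | Admission_Deposit | Stay | Type
0 | 0 | 1 | 8 | c | 3 | Z | 3 | radiotherapy | R | F | 2.0 | 31397 | 7.0 | Emergency | Extreme | 2 | 51-60 | 4911.0 | 0-10 | Train
1 | 1 | 2 | 2 | c | 5 | Z | 2 | radiotherapy | S | F | 2.0 | 31397 | 7.0 | Trauma | Extreme | 2 | 51-60 | 5954.0 | 41-50 | Train
2 | 2 | 3 | 10 | e | 1 | X | 2 | anesthesia | S | E | 2.0 | 31397 | 7.0 | Trauma | Extreme | 2 | 51-60 | 4745.0 | 31-40 | Train
3 | 3 | 4 | 26 | b | 2 | Y | 2 | radiotherapy | R | D | 2.0 | 31397 | 7.0 | Trauma | Extreme | 2 | 51-60 | 7272.0 | 41-50 | Train
4 | 4 | 5 | 26 | b | 2 | Y | 2 | radiotherapy | S | D | 2.0 | 31397 | 7.0 | Trauma | Extreme | 2 | 51-60 | 5558.0 | 41-50 | Train
5 | 5 | 6 | 23 | a | 6 | X | 2 | anesthesia | S | F | 2.0 | 31397 | 7.0 | Trauma | Extreme | 2 | 51-60 | 4449.0 | 11-20 | Train
6 | 6 | 7 | 32 | f | 9 | Y | 1 | radiotherapy | S | B | 3.0 | 31397 | 7.0 | Emergency | Extreme | 2 | 51-60 | 6167.0 | 0-10 | Train
7 | 7 | 8 | 23 | a | 6 | X | 4 | radiotherapy | Q | F | 3.0 | 31397 | 7.0 | Trauma | Extreme | 2 | 51-60 | 5571.0 | 41-50 | Train
8 | 8 | 9 | 1 | d | 10 | Y | 2 | gynecology | R | B | 4.0 | 31397 | 7.0 | Trauma | Extreme | 2 | 51-60 | 7223.0 | 51-60 | Train
9 | 9 | 10 | 10 | e | 1 | X | 2 | gynecology | S | E | 3.0 | 31397 | 7.0 | Trauma | Extreme | 2 | 51-60 | 6056.0 | 31-40 | Train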

Last rows

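(index) | df_index | case_id | Hospital_code | Hospital_type_code | City_Code_Hospital | Hospital_region_code | Available Extra Rooms in Hospital | Department | Ward_Type | Ward_Facility_Code | Bed Grade | patientid | City_Code_Patient | Type of Admission | Severity of Illness | Visitors with Patient | Age | Admission_Deposit | Stay | Type
455485 | 137047 | 455486 | 9 | d | 5 | Z | 2 | gynecology | S | F | 4.0 | 55235 | 4.0 | Emergency | Moderate | 3 | 31-40 | 4418.0 | NaN | Test
455486 | 137048 | 455487 | 13 | a | 5 | Z | 2 | gynecology | R | F | 3.0 | 55235 | 4.0 | Emergency | Moderate | 2 | 31-40 | 3816.0 | NaN | Test
455487 | 137049 | 455488 | 12 | a | 9 | Y | 6 | gynecology | Q | B | 2.0 | 99515 | 7.0 | Emergency | Moderate | 4 | 61-70 | 4406.0 | NaN | Test
455488 | 137050 | 455489 | 13 | a | 5 | Z | 3 | gynecology | R | F | 3.0 | 22878 | 21.0 | Emergency | Moderate | 3 | 21-30 | 4573.0 | NaN | Test
455489 | 137051 | 455490 | 15 | c | 5 | Z | 2 | gynecology | S | F | 4.0 | 118215 | 6.0 | Urgent | Minor | 2 | 21-30 | 5241.0 | NaN | Test
455490 | 137052 | 455491 | 11 | b | 2 | Y | 4 | anesthesia | Q | D | 3.0 | 41160 | 3.0 | Emergency | Minor | 4 | 41-50 | 6313.0 | NaN | Test
455491 | 137053 | 455492 | 25 | e | 1 | X | 2 | radiotherapy | R | E | 4.0 | 30985 | 7.0 | Emergency | Moderate | 2 | 0-10 | 3510.0 | NaN | Test
455492 | 137054 | 455493 | 30 | c | 3 | Z | 2 | anesthesia | R | A | 4.0 | 81811 | 12.0 | Urgent | Minor | 2 | 0-10 | 7190.0 | NaN | Test
455493 | 137055 | 455494 | 5 | a | 1 | X | 2 | anesthesia | R | E | 4.0 | 57021 | 10.0 | Trauma | Minor | 2 | 41-50 | 5435.0 | NaN | Test
455494 | 137056 | 455495 | 6 | a | 6 | X | 3 | gynecology | Q | F | 4.0 | 126729 | 3.0 | Trauma | Extreme | 5 | 51-60 | 4702.0 | NaN | Test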